Improved Models of Distortion Cost for Statistical Machine Translation

Authors

  • Spence Green
  • Michel Galley
  • Christopher D. Manning
Abstract

The distortion cost function used in Moses-style machine translation systems has two flaws. First, it does not estimate the future cost of known required moves, thus increasing search errors. Second, all distortion is penalized linearly, even when appropriate reorderings are performed. Because the cost function does not effectively constrain search, translation quality decreases at higher distortion limits, which are often needed when translating between languages of different typologies such as Arabic and English. To address these problems, we introduce a method for estimating future linear distortion cost, and a new discriminative distortion model that predicts word movement during translation. In combination, these extensions give a statistically significant improvement over a baseline distortion parameterization. When we triple the distortion limit, our model achieves a +2.32 BLEU average gain over Moses.
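To make the baseline concrete, the following is a minimal sketch of the standard linear distortion penalty used in phrase-based decoding, where each jump costs the absolute distance between the end of the previously translated source phrase and the start of the next one. It is not the authors' implementation and does not include their future-cost estimate or discriminative model; the names `Span`, `linear_distortion`, and `total_distortion` are hypothetical, and source positions are assumed to be 0-indexed with inclusive spans.

```python
from typing import List, Tuple

# Hypothetical type: inclusive (start, end) source positions of one phrase.
Span = Tuple[int, int]


def linear_distortion(prev_end: int, next_start: int) -> int:
    """Linear distortion for one jump: distance between the word after the
    previous phrase and the first word of the next phrase."""
    return abs(next_start - prev_end - 1)


def total_distortion(spans: List[Span]) -> int:
    """Sum the linear distortion over phrases in translation order."""
    cost = 0
    prev_end = -1  # assumed convention: position just before the first word
    for start, end in spans:
        cost += linear_distortion(prev_end, start)
        prev_end = end
    return cost


# A six-word source sentence covered by three phrases.
monotone = [(0, 1), (2, 2), (3, 5)]   # translated left to right
reordered = [(3, 5), (0, 1), (2, 2)]  # last phrase translated first

print(total_distortion(monotone))   # 0
print(total_distortion(reordered))  # 9
```

In the reordered case the initial jump costs 3, while the jump of 6 back to the start of the sentence is charged only when it is taken, even though it is unavoidable once the first phrase has been skipped. This appears to be the kind of known required move that, according to the abstract, the baseline cost does not anticipate.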

Similar resources

Segment Choice Models: Feature-Rich Models for Global Distortion in Statistical Machine Translation

This paper presents a new approach to distortion (phrase reordering) in phrase-based machine translation (MT). Distortion is modeled as a sequence of choices during translation. The approach yields trainable, probabilistic distortion models that are global: they assign a probability to each possible phrase reordering. These “segment choice” models (SCMs) can be trained on “segment-aligned” sente...

Distortion Model Considering Rich Context for Statistical Machine Translation

This paper proposes new distortion models for phrase-based SMT. In decoding, a distortion model estimates the source word position to be translated next (NP) given the last translated source word position (CP). We propose a distortion model that can consider the word at the CP, a word at an NP candidate, and the context of the CP and the NP candidate simultaneously. Moreover, we propose a furth...

A new model for persian multi-part words edition based on statistical machine translation

In English, multi-part words are hyphenated: a hyphen separates the parts. Persian has multi-part words as well. Under Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common misuse of the space character leads to some s...

NUT-NTT statistical machine translation system for IWSLT 2005

In this paper, we present a novel distortion model for phrase-based statistical machine translation. Unlike previous phrase distortion models, whose role is simply to penalize nonmonotonic alignments [1, 2], the new model assigns a probability to the relative position of two source-language phrases aligned to two adjacent target-language phrases. The phrase translation probabilities an...

Reordering Models for Statistical Machine Translation: A Literature Survey

In this survey, we briefly study various reordering models that are used with statistical translation models. The reordering model is one of the important components of any statistical machine translation system. The reordering problem is itself NP-hard. In this survey, we study various reordering approaches that can be used to solve this problem. We first study simple distortion-based reordering whi...

Publication date: 2010